Conversation
Force-pushed from 5c0cd25 to d3505c6.
This is now consistent, since GMX_LOG is rank-0 only and cerr is not allowed per the GROMACS style guide.
1. The bonded interaction building (make_bondeds_zone) runs for all zones but won't find any cross-zone bondeds when !hasInterAtomicInteractions(); it's a no-op for the extra zones.
2. The exclusion building (make_exclusions_zone) correctly builds exclusion entries for all i-zone atoms, satisfying the pairlist assertion.

Added nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions) to ensure the exclusion-building loop covers all i-zones when an intermolecularExclusionGroup is present. Without this, 3D DD (e.g. 2x2x2 with 8 ranks) has numIZones=4 but nzone_bondeds=1, so exclusion lists are only built for zone-0 atoms while the nbnxm assertion expects them for zones 0-3. A standalone sketch of the zone-count logic follows below.
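For illustration, here is a self-contained C++ sketch of the zone-count clamp described above. The variable names mirror the description, but the surrounding GROMACS domain-decomposition code is only paraphrased; the flags and numbers below are made up for the example.

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // Example inputs: a system with no inter-atomic (cross-zone) bonded
    // interactions but with an intermolecular exclusion group, decomposed
    // 2x2x2 over 8 ranks.
    const bool hasInterAtomicInteractions       = false;
    const bool haveIntermolecularExclusionGroup = true;
    const int  numZones                         = 8;
    const int  numIZonesForExclusions           = 4;

    // Without cross-zone bondeds only zone 0 needs bonded assignment...
    int nzone_bondeds = hasInterAtomicInteractions ? numZones : 1;

    // ...but exclusion entries must still be built for every i-zone atom,
    // otherwise the nbnxm pairlist assertion fails for zones 1-3.
    if (haveIntermolecularExclusionGroup)
    {
        nzone_bondeds = std::max(nzone_bondeds, numIZonesForExclusions);
    }

    std::printf("bonded/exclusion building loops over zones [0, %d)\n", nzone_bondeds);
    return 0;
}
```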
1. localToModelIndex_ sized to numLocalPlusHalo instead of signal.x_.size() (was an out-of-bounds write).
2. augmentGhostPairs rewritten to correctly identify halo MTA atoms by iterating localToModelIndex_ from numLocalAtoms_ onward, instead of incorrectly slicing the full coordinate array.
3. Shift vector computed as pair.dx() - (positions_[B] - positions_[A]) in model space, then rounded to integer cell shifts and recomputed from the box vectors for consistency.
4. Deduplication of pairs using std::set<tuple> to handle overlap between signal pairs and augmented halo-halo pairs.
5. Timer instrumentation via a MetatomicTimer RAII class around key phases.

A standalone sketch of items 3 and 4 is shown below.
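The following sketch of items 3 and 4 assumes an orthorhombic box, so the integer cell-shift solve reduces to a per-component division; all names and values are illustrative, not the actual implementation.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <set>
#include <tuple>

int main()
{
    const std::array<double, 3> boxDiag = { 4.0, 4.0, 4.0 };
    const std::array<double, 3> posA    = { 0.2, 0.1, 0.3 };
    const std::array<double, 3> posB    = { 3.9, 0.2, 0.1 };
    const std::array<double, 3> pairDx  = { 0.3, 0.1, -0.2 }; // dx reported for the pair

    // Item 3: the residual between the reported dx and the naive model-space
    // difference is the periodic image offset; round it to integer cell
    // shifts and rebuild the shift from the box so it is exactly consistent.
    std::array<int, 3>    cellShift{};
    std::array<double, 3> shift{};
    for (int d = 0; d < 3; d++)
    {
        const double raw = pairDx[d] - (posB[d] - posA[d]);
        cellShift[d]     = static_cast<int>(std::lround(raw / boxDiag[d]));
        shift[d]         = cellShift[d] * boxDiag[d];
    }
    std::printf("cell shift: %d %d %d\n", cellShift[0], cellShift[1], cellShift[2]);

    // Item 4: deduplicate (i, j, cell shift) pairs across signal pairs and
    // augmented halo-halo pairs with a std::set of tuples.
    std::set<std::tuple<int, int, int, int, int>> seen;
    const bool inserted =
            seen.insert(std::make_tuple(0, 1, cellShift[0], cellShift[1], cellShift[2])).second;
    std::printf("pair inserted: %s\n", inserted ? "yes" : "no");
    return 0;
}
```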
```cpp
// TODO: For multi-layer GNN models (MACE, NequIP, etc.) the
// interaction_range should be n_layers * cutoff so that DD halos
// are deep enough for message-passing. Many models currently
```
I think there is nothing we can do here. The model has to declare the correct interaction_range property, and we also say that this has to include the message-passing layers.
Makes sense; I mean, we have the max_cutoff stuff.
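To make the relationship in that TODO concrete, here is a toy calculation; the cutoff and layer count are made up, and the point is only that the declared interaction_range must already account for the message-passing depth.

```cpp
#include <cstdio>

int main()
{
    const double cutoffPerLayer = 0.5; // nm, neighbour-list cutoff of one message-passing layer
    const int    numLayers      = 3;   // e.g. a MACE/NequIP-style model

    // The engine only sees the declared interaction_range, so a multi-layer
    // model has to declare the full message-passing reach itself.
    const double interactionRange = numLayers * cutoffPerLayer;

    std::printf("declared interaction_range should be %.2f nm\n", interactionRange);
    return 0;
}
```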
```cpp
// With thread-MPI, each rank is a thread sharing the same process.
// PyTorch's internal OpenMP would spawn N threads per rank, causing
// massive oversubscription (e.g. 12 ranks × 12 OMP threads = 144
// threads on 12 cores). Force single-threaded torch operations.
```
I thought we still oversubscribe even with this check?
Sure, but that's on the user; without this, PyTorch separately tries to use the OMP variable.
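A minimal sketch of the guard being discussed, assuming libtorch's at::set_num_threads (which controls the intra-op thread pool) and a caller-provided haveThreadMpi flag; this is not the actual patch.

```cpp
#include <ATen/Parallel.h>

// With thread-MPI every rank is a thread in the same process, so letting each
// rank's libtorch spawn its own OpenMP pool oversubscribes the node.
void limitTorchThreadsForThreadMpi(bool haveThreadMpi)
{
    if (haveThreadMpi)
    {
        at::set_num_threads(1); // intra-op threads used by PyTorch kernels
    }
}
```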
```cpp
if (fp)
{
    data_->dtype = torch::kFloat32;
    std::fprintf(fp,
```
Shouldn't this use the GROMACS log mechanism instead of a plain fprintf?
So the GROMACS logger only works on the main rank, and the docs suggest not using iostreams:

> Use STL, but do not use iostreams outside of the unit tests. iostreams can have a negative impact on performance compared to other forms of string streams, depending on the use case. Also, they don't always play well with using C stdio routines at the same time, which are used extensively in the current code. However, since Google tests rely on iostreams, you should use it in the unit test code.

So fprintf seemed like the best choice; it's also used all over the code.
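For contrast, a rough sketch of the two options discussed here; gmx::MDLogger/GMX_LOG and the per-rank log FILE* are standard GROMACS facilities, but the function and message below are invented for illustration.

```cpp
#include <cstdio>

#include "gromacs/utility/logger.h"

void reportModelDtype(const gmx::MDLogger& mdlog, FILE* fplog)
{
    // GMX_LOG only produces output where the logger has a target, i.e. the
    // main rank; use it for messages that should appear once.
    GMX_LOG(mdlog.info).appendText("Metatomic model dtype: float32");

    // Per-rank diagnostics go through the rank's own log FILE*.
    if (fplog != nullptr)
    {
        std::fprintf(fplog, "Metatomic model dtype on this rank: float32\n");
    }
}
```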
I like the timer a lot. However, you said there is a GROMACS internal timer; the question is whether we have to use that to get the code upstream. But maybe we can think about this once we start that step.
Yeah, currently it is env-var gated, but maybe they'd prefer us to use theirs.
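Since MetatomicTimer itself isn't shown in this thread, here is a generic sketch of what an env-var-gated RAII phase timer can look like; the class name, environment variable, and output format are all placeholders.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Times the enclosing scope and prints the elapsed wall time on destruction,
// but only when the (hypothetical) GMX_METATOMIC_TIMING variable is set.
class ScopedPhaseTimer
{
public:
    explicit ScopedPhaseTimer(std::string phase) :
        phase_(std::move(phase)), start_(std::chrono::steady_clock::now())
    {
    }
    ~ScopedPhaseTimer()
    {
        if (std::getenv("GMX_METATOMIC_TIMING") == nullptr)
        {
            return;
        }
        const double ms = std::chrono::duration<double, std::milli>(
                                  std::chrono::steady_clock::now() - start_)
                                  .count();
        std::fprintf(stderr, "[metatomic timer] %s: %.3f ms\n", phase_.c_str(), ms);
    }

private:
    std::string                                        phase_;
    std::chrono::time_point<std::chrono::steady_clock> start_;
};

// Usage: ScopedPhaseTimer timer("augmentGhostPairs");
```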
Look away until vesin#3.

nnpot style implementation #1

Or "real domain decomposition"...

Basically the LAMMPS style where the model loads everywhere, computes on every rank.

WIP. Needs testing to ensure consistency.

All set. Closes #7.